Integration Flow
- Create a Voice Session (HTTP): Obtain a unique `wsUrl` and `sessionId`.
- Stream Audio (WebSocket): Send base64-encoded PCM16 audio frames and handle inbound agent messages.
Prerequisites
- Agent ID: The unique identifier of your Lyzr agent.
- Audio Format: Ability to produce 24kHz mono PCM16.
- Environment: Client must run on HTTPS for browser microphone access.
- Network Access: Ability to reach `POST https://voice-sip.voice.lyzr.app/session/start`.
Important Rules
- URL Integrity: Always use the `wsUrl` exactly as returned. Do not construct it yourself.
- Encoding: Send audio as base64 of raw PCM16 bytes (not WAV, MP3, or float32).
- Sample Rate: Ensure your audio is actually 24kHz; resample if necessary.
1. Create a Session (HTTP)
Initialize the session by calling the Lyzr Voice SIP endpoint.
- Method: `POST`
- URL: `https://voice-sip.voice.lyzr.app/session/start`
- Headers: `Content-Type: application/json`
Example Request
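A minimal request sketch in TypeScript; the body field `agentId` is an assumption, so check your Lyzr API reference for the exact payload.

```typescript
// Sketch: create a voice session. The request body shape (agentId) is an
// assumption; consult your Lyzr API reference for the exact fields.
async function createSession(agentId: string): Promise<{ wsUrl: string; sessionId: string }> {
  const response = await fetch("https://voice-sip.voice.lyzr.app/session/start", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ agentId }),
  });
  if (!response.ok) {
    throw new Error(`session/start failed with status ${response.status}`);
  }
  return response.json(); // expected to include wsUrl and sessionId
}
```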
Response Shape
- `wsUrl`: Treat as an opaque URL; connect exactly as returned.
- `audioConfig`: Informational; assumes 24kHz mono PCM16.
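As a rough TypeScript sketch of the response (only `wsUrl`, `sessionId`, and `audioConfig` are named in this guide; the nested fields are assumptions):

```typescript
// Sketch of the session/start response. Only wsUrl, sessionId, and audioConfig
// are documented here; the nested audioConfig fields are assumptions.
interface StartSessionResponse {
  sessionId: string; // unique session identifier
  wsUrl: string;     // opaque WebSocket URL; connect to it exactly as returned
  audioConfig?: {
    sampleRate: number; // expected to be 24000
    channels: number;   // expected to be 1 (mono)
    format: string;     // expected to describe PCM16
  };
}
```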
2. WebSocket Implementation
Connection Lifecycle
- Graceful Shutdown: Stop microphone capture before closing the WebSocket.
- Reconnection: If the socket closes, call `session/start` again for a new URL. Do not reuse old URLs.
- Keepalive: Send periodic “silence” frames (PCM16 zeros) to prevent idle disconnects if your platform doesn’t handle ping/pong.
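One possible keepalive helper, assuming outbound audio frames are JSON messages with a base64 `audio` field (see Message Formats below) and a browser environment:

```typescript
// Sketch: build ~100ms of PCM16 silence and send it as a keepalive frame.
// Assumes the outbound shape { type: "audio", audio: "<base64>" } and a
// browser environment (btoa); use Buffer for base64 on Node.js.
function sendKeepaliveSilence(ws: WebSocket, sampleRate = 24000, ms = 100): void {
  if (ws.readyState !== WebSocket.OPEN) return;
  const bytes = new Uint8Array(Math.floor((sampleRate * ms) / 1000) * 2); // zeroed PCM16 = silence
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  ws.send(JSON.stringify({ type: "audio", audio: btoa(binary) }));
}
```

Calling this every 10 to 15 seconds while idle is typically enough; stop once real audio is flowing.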
Audio Pacing & Backpressure
- Chunk Duration: Aim for 20–100ms per message.
- Backpressure: Monitor `ws.bufferedAmount` in browsers; if it climbs, throttle your sending speed.
- Ready State: Only send data when `ws.readyState === WebSocket.OPEN`.
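A small send helper illustrating the ready-state and backpressure checks above; the 1 MB threshold is an illustrative value, not a documented limit.

```typescript
// Sketch: only send when the socket is open, and back off when the browser's
// send buffer is filling up. The threshold below is illustrative.
const MAX_BUFFERED_BYTES = 1_000_000;

function trySendAudioFrame(ws: WebSocket, frameJson: string): boolean {
  if (ws.readyState !== WebSocket.OPEN) return false;       // not connected
  if (ws.bufferedAmount > MAX_BUFFERED_BYTES) return false; // throttle; caller can retry later
  ws.send(frameJson);
  return true;
}
```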
Message Formats
Client → Service (Audio Frame)
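The outbound schema is not spelled out in this guide; a plausible sketch, assuming it mirrors the inbound audio message documented below:

```typescript
// Assumption: the client audio frame mirrors the inbound audio message shape.
// Verify the exact field names against your session's documentation.
const frame = JSON.stringify({
  type: "audio",           // assumed message type
  audio: "<base64 PCM16>", // base64 of raw 24kHz mono PCM16 bytes
});
```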
Service → Client (Audio & Transcripts)
- Audio: `{ "type": "audio", "audio": "<base64>" }`.
- Transcript: JSON messages containing text, content, or roles (e.g., `type: "transcript"`). Treat transcript payloads defensively, as shapes may vary.
Code Examples
Browser (TypeScript/WebAudio)
This captures microphone audio and converts it to the required format.
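A minimal capture sketch, assuming an outbound frame shape of `{ type: "audio", audio: "<base64>" }` and a `wsUrl` already obtained from `session/start`. `ScriptProcessorNode` is deprecated but keeps the sketch short; prefer an `AudioWorklet` in production.

```typescript
// Sketch: capture mic audio at 24kHz, convert Float32 -> PCM16, base64-encode,
// and stream it over the session WebSocket. The frame shape is an assumption.
async function streamMicrophone(wsUrl: string): Promise<void> {
  const ws = new WebSocket(wsUrl);
  await new Promise<void>((resolve, reject) => {
    ws.onopen = () => resolve();
    ws.onerror = () => reject(new Error("WebSocket connection failed"));
  });

  // Request a 24kHz context so captured audio matches the required rate.
  const audioContext = new AudioContext({ sampleRate: 24000 });
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const source = audioContext.createMediaStreamSource(stream);

  // 2048 samples at 24kHz is ~85ms per message, within the 20-100ms guideline.
  const processor = audioContext.createScriptProcessor(2048, 1, 1);
  processor.onaudioprocess = (event) => {
    if (ws.readyState !== WebSocket.OPEN) return;

    const input = event.inputBuffer.getChannelData(0); // Float32 in [-1, 1]
    const pcm16 = new Int16Array(input.length);
    for (let i = 0; i < input.length; i++) {
      const s = Math.max(-1, Math.min(1, input[i])); // clamp before conversion
      pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }

    // Base64-encode the raw PCM16 bytes.
    const bytes = new Uint8Array(pcm16.buffer);
    let binary = "";
    for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);

    ws.send(JSON.stringify({ type: "audio", audio: btoa(binary) }));
  };

  source.connect(processor);
  processor.connect(audioContext.destination); // output stays silent; required for processing in some browsers
}
```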
Node.js (Backend Worker)
Use this if you are streaming pre-recorded audio or working from a server environment.
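A server-side sketch using the `ws` package, assuming a headerless raw PCM16 file at 24kHz mono and the same assumed frame shape; it paces sends in roughly real-time 100ms chunks.

```typescript
// Sketch: stream a raw PCM16 file (24kHz mono, no WAV header) from Node.js,
// paced in ~100ms chunks. The frame shape { type: "audio", audio } is an assumption.
import { readFileSync } from "node:fs";
import WebSocket from "ws";

const SAMPLE_RATE = 24000;
const CHUNK_MS = 100;
const BYTES_PER_CHUNK = (SAMPLE_RATE * 2 * CHUNK_MS) / 1000; // 2 bytes per PCM16 sample

function streamPcmFile(wsUrl: string, pcmPath: string): void {
  const pcm = readFileSync(pcmPath); // raw PCM16 bytes, not WAV or MP3
  const ws = new WebSocket(wsUrl);

  ws.on("open", () => {
    let offset = 0;
    const timer = setInterval(() => {
      if (offset >= pcm.length || ws.readyState !== WebSocket.OPEN) {
        clearInterval(timer);
        return;
      }
      const chunk = pcm.subarray(offset, offset + BYTES_PER_CHUNK);
      offset += BYTES_PER_CHUNK;
      ws.send(JSON.stringify({ type: "audio", audio: chunk.toString("base64") }));
    }, CHUNK_MS);
  });

  ws.on("message", (data) => {
    // Handle inbound agent audio/transcript messages here (see Message Formats).
    console.log("inbound:", data.toString().slice(0, 120));
  });
}
```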
Playback Notes
To play agent audio in the browser:
- Decode: Base64-decode the `audio` string into a `Uint8Array`.
- Convert: Map `Int16` bytes to `Float32` (divide by 32768).
- Play: Feed the resulting `Float32Array` into an `AudioBuffer` set at 24,000 Hz.
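A minimal playback sketch following those three steps, assuming inbound audio arrives as base64 PCM16 at 24kHz:

```typescript
// Sketch: decode base64 PCM16 from the agent and play it at 24kHz.
const playbackContext = new AudioContext({ sampleRate: 24000 });

function playAgentAudio(base64Pcm16: string): void {
  // Decode: base64 -> raw bytes.
  const binary = atob(base64Pcm16);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);

  // Convert: Int16 -> Float32 in [-1, 1].
  const pcm16 = new Int16Array(bytes.buffer);
  const float32 = new Float32Array(pcm16.length);
  for (let i = 0; i < pcm16.length; i++) float32[i] = pcm16[i] / 32768;

  // Play: wrap in an AudioBuffer at 24,000 Hz.
  const buffer = playbackContext.createBuffer(1, float32.length, 24000);
  buffer.copyToChannel(float32, 0);
  const sourceNode = playbackContext.createBufferSource();
  sourceNode.buffer = buffer;
  sourceNode.connect(playbackContext.destination);
  sourceNode.start();
}
```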
Troubleshooting
- Distorted Audio: Ensure you are clamping samples to `[-1, 1]` before PCM16 conversion.
- Immediate Disconnect: Verify the `wsUrl` is used exactly as provided and your agent ID is valid.
- No Transcripts: Check all inbound message fields; transcript keys can vary by agent configuration.